co 



a 

-6 



o 



Interdisciplinary Information Sciences Vol. XX, No. XX (20YY)l-5 
©Graduate School of Information Sciences, Tohoku University 
ISSN 1340-9050 print/1347-6157 online 
DOI 10.4036/iis.20YY.??? 

Statistical Mechanical Formulation and Simulation of Prime 

Factorization of Integers 

Chihiro H Nakajima* 
Department of Physics, Kyushu University, 33 Fukuoka 812-8581, Japan 



We propose a new formulation of the problem of prime factorization of integers. With replica exchange Monte 
Carlo simulation, the behavior which is seemed to indicate exponential computational hardness is observed. But this 
5_j ' formulation is expected to give a new insight into the computational complexity of this problem from a statistical 

C^ , mechanical point of view. 
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S ; 1 Introduction 

C 

Prime factorization problem is one of relatively few problems called NP-intermediate, which is not considered to 
be NP-complete but no polynomial algorithm has been found. In this proceeding, we propose a statistical mechanical 
formulation of this problem and numerical analysis of them with Monte Carlo algorithm. The statistical mechanical 
modeling would reveal new features lying under the computational hardness of the problem through the structure of its 
phase space landscape [1-3]. In particular, one might determine the computational complexity class of the problem with 
an analysis of the existence and the order of phase transition phenomena of the model. 

Furthermore, in terms of practicality, there have been several cases which the computational hardness of NP-hard 
problems are overcome by probabilistic algorithms. Therefore, also in practical point of view, it attracts our attention 

O \ to studying the prime factorization problem with the form that is tractable by probabilistic algorithms which is now 
conventionally used in the field of statistical mechanics and verifying wheather it can be solved in polynomial time with 
them. 
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<^ • 2.1 Guidelines for Formulation 



2 Models and Methods 



Suppose that an integer N is given. To obtain the prime factorization of N , we will solve an optimization problem 
by Markov chain Monte Carlo (MCMC) simulation with the cost function in the phase space. First, when an integer A^„ 
is located in 2" _1 < N < 2", the number of its prime divisors is bounded by n. In other words, a number n is defined for 
each N as 

«=riog 2 JV l, (2.1) 



where \x\ is the minimal integer which is larger than x. Let {dj} be the state in the phase space composed of the set of 
integers dj. 

Second, the cost function should be designed to favor states in which the elements of {d{} are divisors of N„ and to 
take its lowest value when the entire set of {d,} is the prime factorization of N . Using the information of the residues 
we can design such cost function. By definition of the residue, each mod (N ,dt), which is obtained by dividing N by 
each dj, takes the value only in the case that dt is a divisor of N , and becomes positive in the other case. Instead of 
mod (N ,di) itself, we can also adopt following function of d, and mod (N ,di), 

e, = min I mod(N ,di), d{— mod(N ,di) I , (2.2) 

without loss of such property. 

The order of the residue mod (N ,di) or £; is up to that of 0(exp(n)). To formulate a proper statistical mechanical 
model, we would like to keep the extensiveness of the cost function; Hamiltonian, with respect to n. By taking logarithm 
[see Eq.(2.5) below] or using coefficients of /j-adic expansion £,■ [see Eq.(2.9)-(2. 11) below], we can keep such exten- 
siveness. Especially, with the case of using the p-adic expansion coefficient, the extensiveness is naturally guaranteed 
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and its value always becomes integer. Thus it is particularly convenient in order to calculate the statistical mechanical 
quantities. 

2.2 Model Hamiltonian 

The cost function is mainly composed of two contributions H\ and H 2 . H\ comes from the residue terms and H 2 does 
from the difference of the product Yhdi from iV . When representing the prime factorization form by the entire set of dj 
we set i = n and prepare n integers {dj} = {^i, ■ ■ -d n }. Each dt takes the value dt <E {1, ■ ■ ■ ,2"}. 

Thus the detail of the resulting Hamiltonian H w i w i e ({di}) is shown as, 



H who i e ({di} | N ) =H l +H 2 -YM, ( 7 > 



(2.3) 



Hi 



£log(l 



£i 



;=1 



(2.4) 



H 2 = ^(logN - 1 £log(di) 



i=i 



(2.5) 



where M is the number of integers included in {d,} which is larger than 1. Hi takes the value when all c/, become any 
divisors of N . This term works to each dj locally. On the other hand, H 2 takes the value when the product of all d, is 
equal to N () . This term works globally to prohibit the case that all d, takes the value 1. Thus the Hamiltonian which takes 
the value if and only if the full set of prime divisors is realized by {dj}. 



4.5 r 

4 ■ 
3.5 

3 
2.5 

2 
1.5 

I 
0.5 




150 

d 



Fig. 1: The profile of log(l + £,-) introduced in Eq.(2.4) in the 
case of N„ = 235. It indicates that Eq.(2.4) has several local 
minima. 
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Fig. 2: The histogram of the result of the simulation of //,,,/, / e 
with replica exchange Monte Carlo method in the case of 
N = 23057583. The value of y is taken to be in this result. 
The color red, green, blue, and purple represent the nmber ogf 
hit more than 300, 1000, 3000 and 10000 times respectively. At 
sufficiently low temperature, only the case which dj is a divi- 
sor of N is sampled. 



2.3 Another model 

To formulate the problem of prime factorization as ground state searching, there is some arbitrariness in designing of 
Hamiltonian. In practice, it is also efficient to decompose N into two divisors recursively and apply the primality test. In 
this procedure, the Hamiltonian of factorization of A^„ into d\ , d 2 can be written as follows, 



H elem ({d i }\N )=H 1 +H 2 , 

" ( 
#i = L C7(ei,j) + c7(e2,;') 



/=! 



(2.6) 

(2.7) 



H 2 =Y,e{\did 2 -N \,k), 



(2.8) 
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where the function a represents the coefficient of the p-adic expansions of the variables, defined as, 

e, = gfffe.jV" 1 , (2.9) 

j 

\d l d 2 -N \ = ^a{\did 2 -N \,k)p k - 1 , (2.10) 

k 

a g {0,-,p-l}. (2.11) 

The ground states of this Hamiltonian does not become the prime factorization of N () unless it is the composite number 
of two prime numbers. But we can investigate the elemental process of the factorization with this Hamiltonian. 

2.4 Searching with Replica exchange Monte Carlo Method 

Changing each dj randomly with each step, we search the prime factorization of N by optimizing the state H w i m i e ({dj} \N () ) 
or H e t em ({dj}\N ) with certain condition. We have implemented the rule of the transitions in the phase space by repre- 
senting each dj with p-adic expansion similar to Eq.(2.9)-(2.11) in the cases of the cost function. But in this case we 
adopted following form of the expansion, 

di = Y + Y^a{d t -\,j)q j -\ (2.12) 

j 

a e {0,---,q-l}, (2.13) 

so that each dj does not take the value 0. We note q instead of p to avoid the confusion. Thus the phase space is divided- 
and-conquered by Potts (Ising) like variables a. Transitions in the phase space are performed by shifting (or flipping) the 
value of each a. For example, when we adopt q = 2, the above two Hamiltonians //„ / IO / e and H e \ em are described with n 2 
and 2« degrees of freedom respectively. 

In the potential energy landscape of // w /, / e or H e [ em , like that of spin glass models, there are several local minima 
[see Fig.l]. Even with probabilistic sampling, in ordinary way, the random walker can easily become trapped in each 
minima. To avoid these trap, it requires heating and annealing of the system. Thus we perform simulations with several 
temperature in parallel with exchanging [5] the correspondence of each walker and temperature. When a replica is in 
temperature T\, the state is sampled as an instance in distribution P(T, T\). With some certain condition of transition 
probability, we can keep the canonical stationary distribution in each temperature. 

Replica exchange (or some other extended ensemble) Monte Carlo methods are applied to several optimization or 
constraint satisfaction problems branching from spin glass, including NP-hard problems, and powerful tool for estimating 
expectation values with less systematic errors, finding the optimal solution, computing entropy or free energy, counting 
the number of solutions of these models. 

3 Result 

3.1 Behavior of Computational Cost 

We numerically observed the first passage time Xfi rs t, the Monte Carlo step that the walker of the MCMC simulation 
visits the state of the correct factorization for the first time, and its dependence on the system size n both for H w / W i e 
and H e ) em . Various samples of N are generated by multiplying two prime numbers which are randomly choosed but 
moderately close to each other. 

First we start from the explanation of the results of the simulations of H w i 10 { e . Fig. 3 shows the dependence of T/,„, on 
n = \og 2 N a . Here the results are obtained with 7=0. The green points shows the average over 10 independent samples 



for each iV , The average is taken as exp ( logT/,„( j and the dispersion exp \/ log Tf/ r ,sf — (logT/j>^ ) indicates a 

measure of the variation. Due to the large variation, it is difficult to distinguish between the exponential dependence and 
power-law dependence with large exponent from these results. But even if the power-law dependence was correct, the 
exponent is estimated as nearly 8. Such a large value is thought to be quantitatively unfamiliar. 

Fig. 4 and 5 show the results of the case of H e [ em . In this case we performed with two different manner of conquering 
the phase space, q = 2 (by Ising variables) and q — 3 (by Potts variables), with same N . The number of degrees of 
freedom and the transitions in the phase space are different between these cases. In both cases the results are obtained 
by averaging over 10 sets of independent simulations for each N . Even though the detailed structure of the phase space 
landscape is different, these results exhibit similar behavior to the case of H w i w i e . They are also seen to exhibit exponential 
dependence on n () and large (still larger) variation in each sample. 
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Fig. 3: The dependecce of Ty,„ ( on n = log 2 iV with the simulation of H w f,„i e . (A):Double-logarithmic plot, (B):Semilogrithmic plot. 
The red points replesents independent 10 samples for each n„ = log 2 N and the green points represents average over them. The error 
bar represents the dispersion obtained from log 1fl TS t of each simulation. It indicates a measure of variation. 
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Fig. 4: The dependecce of Zfj rst on n with the simulation of 
Heiem w ith Q = 2. The means of red points, green points and 
error bars are the same as Fig.3 respectively. Both the average 
and the dispersion are calculated with the same manner. 



Fig. 5: The dependecce of l/,y s , on n„ with the simulation of 
H e iem w ith 9 = 3. The means of red points, green points and 
error bars are the same as Fig.3 respectively. Both the average 
and the dispersion are calculated with the same manner. 



4 Summary and Discussion 



In this proceeding we have proposed statistical mechanical formulations of the problem of factorization of integers. 
We set up two kinds of Hamiltonians and gave rough overviews of the size dependence of their computational cost for 
optimization by replica exchange Monte Carlo method. Though the results are not yet sufficient to obtain quantitative 
conclusion, we observed the behavior which is seemed to indicate that they require exponential computational cost. 
However, it should be noted that several questions and issues still remain. 

First, the roughly yielded results of the first passage time should be improved by extensive investigation. An accurate 
determination of the probability distribution of the first passage times is left for future work. As the system size depen- 
dence of the distribution directly reflects the computational complexity of this problem, it should be precisely computed. 
In the above result about ?/,„,, the dispersion of log Tf,„ t calculated from 10 independent samples is given as a measure 
of the spread of the distribution function, assuming that their distribution is a Gaussian. But the tendency of the variation 
in the figures indecates a possibility that the actual form of the distribution may not be Gaussian. 

Second, it should be emphasized that the number field sieve method is already known as an relatively efficient 
algorithm that achieves <9(exp(«2 )) computational cost. This fact is one of the reasons for that the prime factorization 
problem is expected to be not NP problem. Considering the fact, it is suggested that the formulation in this proceeding 
has not yet reached the level of the maximum possible efficiency in the classical computer. In order to use the above 
results as any evidence about the computational complexity class with classical probabilistic algorithm, it is thought to 
be the most reasonable with the case that is confirmed with the most efficient algorithm. We argue that it is still not 
excluded out the possibility that we can reach the most efficient level of computation amount by the choice of the way 
of dividing-and-conquering the phase space, the transition rule, and the temperature points of each replicas [7]. While, 
it is non-trivial whether we can introduce Markov chain and cost function into the algorithm of the number field sieve 
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method. 

As an another subject of future works, we plan to investigate the static quantities of these statistical mechanical 
models. In the statistical mechanics of spin glass theory, the relationship among the phase transition, computational 
complexity, and the structure of the potential energy landscape are discussed [ 1 , 3,4] . In this direction, a number of studies 
have been investigated mainly around the adaptation of the cavity method to random satisfiability, random graph coloring 
and several other NP-hard problems over this ten and a few years [2, 4, 6] . In the simulations with above Hamiltonians, it 
seemed that the random walker in the phase space had been trapped in several isolated local minima which take the value 
1 or 2 even in the case that N is composed of two prime numbers, when the number of true ground states are order of O ( 1 ) 
in the above Hamiltonians. This property is expected to be related to the rapidly growing computational effort even with 
probabilistic algorithm. It would be effective to observe the density of states focusing on the asymptotic n dependence 
of the low energy tail. The characterics of the density of states can also appear in the temperature dependence of specific 
heat and entropy. By the detailed analysis of the above quantities, one could gain an understanding of the computational 
complexity of this problem. 
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